Author Profiling for English and Spanish Text: Notebook for PAN at CLEF 2013
Abstract
This paper describes an approach to the author profiling task of the PAN 2013 challenge. The work builds on the idea of linguistic modality, which has been used successfully in other classification tasks such as authorship attribution. We consider three modalities (syntactic, stylistic, and semantic), each representing a different aspect of the text. For each modality, we extract informative meta features by computing the similarity between the feature vectors of the test files and the centroids of modality-specific clusters. Because the provided texts are in both Spanish and English, we build a language-independent framework for author profiling. For both English and Spanish documents, our system performed well on the age identification task. For gender prediction, our system did not perform as expected on English, but it yielded good results on Spanish.
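The meta-feature step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the clusters are formed or which similarity measure is used, so the sketch assumes cluster assignments are already available and uses cosine similarity. The function names (`cluster_centroids`, `similarity_meta_features`, `profile_meta_features`) are hypothetical.

```python
import numpy as np


def cluster_centroids(vectors, cluster_ids):
    """Return the mean vector of each cluster, ordered by sorted cluster id.

    Assumption: cluster assignments are given; the abstract does not
    specify how the modality-specific clusters are built.
    """
    vectors = np.asarray(vectors, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    return np.vstack(
        [vectors[cluster_ids == c].mean(axis=0) for c in np.unique(cluster_ids)]
    )


def similarity_meta_features(vectors, centroids, eps=1e-12):
    """Cosine similarity of every row in `vectors` to every centroid.

    Assumption: cosine similarity stands in for the unspecified
    "similarity relations" mentioned in the abstract.
    """
    vectors = np.asarray(vectors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    v_norm = np.maximum(np.linalg.norm(vectors, axis=1, keepdims=True), eps)
    c_norm = np.maximum(np.linalg.norm(centroids, axis=1, keepdims=True), eps)
    return (vectors / v_norm) @ (centroids / c_norm).T


def profile_meta_features(test_by_modality, centroids_by_modality):
    """Concatenate the per-modality similarity features column-wise.

    Both arguments are dicts keyed by modality name
    (for example "syntactic", "stylistic", "semantic").
    """
    return np.hstack(
        [
            similarity_meta_features(test_by_modality[m], centroids_by_modality[m])
            for m in sorted(test_by_modality)
        ]
    )
```

Each test document then gets one similarity value per cluster and modality, and the concatenated vector can be passed to any standard classifier for age or gender prediction.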
Similar resources
Author Profiling Using Style-based Features: Notebook for PAN at CLEF 2013
In this paper, we present a method for profiling the author of an anonymous text. Our approach is based on learning the author profile, with a focus on the age and gender dimensions. Our system takes as input a document written in English or Spanish and outputs the age and gender of its author. First, we computed a ranked list of words that occur in the corpus and we grouped them i...
UniNE at CLEF 2015 Author Profiling: Notebook for PAN at CLEF 2015
This paper describes and evaluates an effective author profiling model called SPATIUM-L1. The suggested strategy can be adapted without any problem to different languages (such as Dutch, English, Italian, and Spanish) in Twitter tweets. As features, we suggest using the 200 most frequent terms of the query text (isolated words and punctuation symbols). Applying a simple distance measure and loo...
XRCE Personal Language Analytics Engine for Multilingual Author Profiling: Notebook for PAN at CLEF 2015
This technical notebook describes the methodology used – and results achieved – for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages – English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification...
Author Profiling using LDA and Maximum Entropy: Notebook for PAN at CLEF 2013
This paper describes the traditional authorship attribution subtask of the PAN/CLEF 2013 workshop. In our attempt to classify documents by the gender and age of their author, we applied a traditional topic modeling approach using Latent Dirichlet Allocation (LDA). We used content-based features such as topics and style-based features such as preposition frequencies, which act as the eff...
Readability for Author Profiling? Notebook for PAN at CLEF 2013
This paper briefly describes the approach taken to the Author Profiling task at PAN 13. It describes the simple features used, and the origins in thinking around text readability as a mechanism for identification, and the predictive model used which may have beneficially omitted classes, as well as offering commentary on the results obtained.